Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

Abstract

Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.
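
To make the left-deep versus right-deep distinction concrete, the following is a minimal sketch of a two-join query evaluated both ways with in-memory hash joins. It is not the paper's cost model or implementation. The helper names (`build`, `probe`, `left_deep`, `right_deep`), the toy relations, and the convention that the left input of each join is the build side are all assumptions made for illustration.

```python
from collections import defaultdict


def build(rows, key):
    """Build an in-memory hash table: join-key value -> list of rows."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(rows, table, key):
    """Stream rows through a hash table, yielding one merged row per match."""
    for row in rows:
        for match in table.get(row[key], ()):
            yield {**match, **row}


def left_deep(r, s, t):
    # Assumed convention: the left input of each join is the build side.
    # The intermediate result R join S is materialized into a hash table
    # before T can probe it.
    rs = probe(s, build(r, "a"), "a")
    return list(probe(t, build(rs, "b"), "b"))


def right_deep(r, s, t):
    # All hash tables are built on base relations first; the rows of S then
    # flow through both tables in a single pipeline.
    r_table = build(r, "a")
    t_table = build(t, "b")
    return list(probe(probe(s, r_table, "a"), t_table, "b"))


# Hypothetical toy relations: R(a, x), S(a, b), T(b, y).
R = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}]
S = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 10}]
T = [{"b": 10, "y": "t1"}, {"b": 30, "y": "t3"}]

left_result = left_deep(R, S, T)
right_result = right_deep(R, S, T)
```

Both plans return the same rows. They differ in which hash tables are built and in whether an intermediate result has to be materialized, which is the kind of difference a memory I/O cost model would have to account for.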